{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/monitor-models/data-drift/drift-on-aks.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Monitor data drift on models deployed to Azure Kubernetes Service \n",
        "\n",
        "In this tutorial, you will setup a data drift monitor on a toy model that predicts elevation based on a few weather factors which will send email alerts if drift is detected."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prerequisites\n",
        "If you are using an Azure Machine Learning Compute instance, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already established your connection to the AzureML Workspace."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Check core SDK version number\n",
        "import azureml.core\n",
        "\n",
        "print('SDK version:', azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Initialize Workspace\n",
        "\n",
        "Initialize a workspace object from persisted configuration."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Workspace\n",
        "\n",
        "ws = Workspace.from_config()\n",
        "ws"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup training dataset and model\n",
        "\n",
        "Setup the training dataset and model in preparation for deployment to the Azure Kubernetes Service. \n",
        "\n",
        "The next few cells will:\n",
        "  * get the default datastore and upload the `training.csv` dataset to the datastore\n",
        "  * create and register the dataset \n",
        "  * register the model with the dataset\n",
        "  \n",
        "See the `config.py` script in this folder for details on how `training.csv` and `elevation-regression-model.pkl` are created. If you train your model in Azure ML using a Dataset, it will be automatically captured when registering the model from the run. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# use default datastore\n",
        "dstore = ws.get_default_datastore()\n",
        "\n",
        "# upload weather data\n",
        "dstore.upload('dataset', 'drift-on-aks-data', overwrite=True, show_progress=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Dataset\n",
        "\n",
        "# create dataset \n",
        "dset = Dataset.Tabular.from_delimited_files(dstore.path('drift-on-aks-data/training.csv'))\n",
        "# register dataset\n",
        "dset = dset.register(ws, 'drift-demo-dataset')\n",
        "# get the dataset by name from the workspace\n",
        "dset = Dataset.get_by_name(ws, 'drift-demo-dataset')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.model import Model\n",
        "\n",
        "# register the model\n",
        "model = Model.register(model_path='elevation-regression-model.pkl',\n",
        "                       model_name='elevation-regression-model.pkl',\n",
        "                       tags={'area': \"weather\", 'type': \"linear regression\"},\n",
        "                       description='Linear regression model to predict elevation based on the weather',\n",
        "                       workspace=ws,\n",
        "                       datasets=[(Dataset.Scenario.TRAINING, dset)]) # need to register the dataset with the model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create the inference config\n",
        "\n",
        "Create the environment and inference config from the `myenv.yml` and `score.py` files. Notice the [Model Data Collector](https://docs.microsoft.com/azure/machine-learning/service/how-to-enable-data-collection) code included in the scoring script. This dependency is currently required to collect model data, but will be removed in the near future as data collection in Azure Machine Learning webservice endpoints is automated."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Environment\n",
        "\n",
        "# create the environment from the yml file \n",
        "env = Environment.from_conda_specification(name='deploytocloudenv', file_path='myenv.yml')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.model import InferenceConfig\n",
        "\n",
        "# create an inference config, combining the environment and entry script \n",
        "inference_config = InferenceConfig(entry_script='score.py', environment=env)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create the AKS compute target\n",
        "\n",
        "Create an Azure Kubernetes Service compute target to deploy the model to. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import AksCompute, ComputeTarget\n",
        "\n",
        "# Use the default configuration (you can also provide parameters to customize this).\n",
        "# For example, to create a dev/test cluster, use:\n",
        "# prov_config = AksCompute.provisioning_configuration(cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)\n",
        "prov_config = AksCompute.provisioning_configuration()\n",
        "\n",
        "aks_name = 'drift-aks'\n",
        "\n",
        "# Create the cluster\n",
        "try:\n",
        "    aks_target = ws.compute_targets[aks_name]\n",
        "except KeyError:\n",
        "    aks_target = ComputeTarget.create(workspace = ws,\n",
        "                                      name = aks_name,\n",
        "                                      provisioning_configuration = prov_config)\n",
        "\n",
        "    # Wait for the create process to complete\n",
        "    aks_target.wait_for_completion(show_output = True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Deploy the model to AKS \n",
        "\n",
        "Deploy the model as a webservice endpoint. Be sure to enable the `collect_model_data` flag so that serving data is collected in blob storage for use by the data drift capability."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.webservice import AksWebservice\n",
        "\n",
        "deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, collect_model_data=True)\n",
        "service_name = 'drift-aks-service'\n",
        "\n",
        "service = Model.deploy(ws, service_name, [model], inference_config, deployment_config, aks_target)\n",
        "\n",
        "service.wait_for_deployment(True)\n",
        "print(service.state)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Run recent weather data through the webservice \n",
        "\n",
        "The below cells take the weather data of Florida from 2019-11-20 to 2019-11-12, filter and transform using the same processes as the training dataset, and runs the data through the service."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# create dataset \n",
        "tset = Dataset.Tabular.from_delimited_files(dstore.path('drift-on-aks-data/testing.csv'))\n",
        "\n",
        "df = tset.to_pandas_dataframe().fillna(0)\n",
        "\n",
        "X_features = ['latitude', 'longitude', 'temperature', 'windAngle', 'windSpeed']\n",
        "y_features = ['elevation']\n",
        "\n",
        "X = df[X_features]\n",
        "y = df[y_features]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "\n",
        "data = json.dumps({'data': X.values.tolist()})\n",
        "\n",
        "data_encoded = bytes(data, encoding='utf8')\n",
        "prediction = service.run(input_data=data_encoded)\n",
        "print(prediction)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create an Azure Machine Learning Compute cluster\n",
        "\n",
        "The data drift capability needs a compute target for computing drift and other data metrics. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import AmlCompute\n",
        "\n",
        "compute_name = 'cpu-cluster'\n",
        "\n",
        "if compute_name in ws.compute_targets:\n",
        "    compute_target = ws.compute_targets[compute_name]\n",
        "    if compute_target and type(compute_target) is AmlCompute:\n",
        "        print('found compute target. just use it. ' + compute_name)\n",
        "else:\n",
        "    print('creating a new compute target...')\n",
        "    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2', min_nodes=0, max_nodes=2)\n",
        "\n",
        "    # create the cluster\n",
        "    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
        "\n",
        "    # can poll for a minimum number of nodes and for a specific timeout.\n",
        "    # if no min node count is provided it will use the scale settings for the cluster\n",
        "    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
        "\n",
        "    # For a more detailed view of current AmlCompute status, use get_status()\n",
        "    print(compute_target.get_status().serialize())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Wait 10 minutes\n",
        "\n",
        "From the Model Data Collector, it can take up to (but usually less than) 10 minutes for data to arrive in your blob storage account. Wait 10 minutes to ensure cells below will run."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import time\n",
        "\n",
        "time.sleep(600)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create and update the data drift object"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from datetime import datetime, timedelta\n",
        "from azureml.datadrift import DataDriftDetector, AlertConfiguration\n",
        "\n",
        "services = [service_name]\n",
        "start = datetime.now() - timedelta(days=2)\n",
        "feature_list = X_features\n",
        "alert_config = AlertConfiguration(['user@contoso.com'])\n",
        "\n",
        "try:\n",
        "    monitor = DataDriftDetector.create_from_model(ws, model.name, model.version, services, \n",
        "                                                  frequency='Day', \n",
        "                                                  schedule_start=datetime.utcnow() + timedelta(days=1), \n",
        "                                                  alert_config=alert_config, \n",
        "                                                  compute_target='cpu-cluster')\n",
        "except KeyError:\n",
        "    monitor = DataDriftDetector.get(ws, model.name, model.version)\n",
        "    \n",
        "monitor"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# many monitor settings can be updated \n",
        "monitor = monitor.update(drift_threshold = 0.1)\n",
        "\n",
        "monitor"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Run the monitor on today's scoring data\n",
        "\n",
        "Perform a data drift run on the data sent to the service earlier in this notebook. If you set your email address in the alert configuration and the drift threshold <=0.1 you should recieve an email alert to drift from this run.\n",
        "\n",
        "Wait for the run to complete before getting the results. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "now = datetime.utcnow()\n",
        "target_date = datetime(now.year, now.month, now.day)\n",
        "run = monitor.run(target_date, services, feature_list=feature_list, compute_target='cpu-cluster')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "time.sleep(1200)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Get and view results and metrics\n",
        "\n",
        "For enterprise workspaces, the UI in the Azure Machine Learning studio can be used. Otherwise, the metrics can be queried in Python and plotted. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# The run() API initiates a pipeline run for each service in the services list. \n",
        "# Here we retrieve the individual service run to get its output results and metrics. \n",
        "\n",
        "child_run = list(run.get_children())[0]\n",
        "results, metrics = monitor.get_output(run_id=child_run.id)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "drift_plots = monitor.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Enable the monitor's pipeline schedule\n",
        "\n",
        "Turn on a scheduled pipeline which will anlayze the serving dataset for drift. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "monitor.enable_schedule()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Next steps\n",
        "\n",
        "  * See [our documentation](https://aka.ms/datadrift/aks) or [Python SDK reference](https://docs.microsoft.com/python/api/overview/azure/ml/intro)\n",
        "  * [Send requests or feedback](mailto:driftfeedback@microsoft.com) on data drift directly to the team\n",
        "  * Please open issues with data drift here on GitHub or on StackOverflow if others are likely to run into the same issue"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jamgan"
      }
    ],
    "category": "tutorial",
    "compute": [
      "Remote"
    ],
    "datasets": [
      "NOAA"
    ],
    "deployment": [
      "AKS"
    ],
    "exclude_from_index": false,
    "framework": [
      "Azure ML"
    ],
    "friendly_name": "Data drift on aks",
    "index_order": 1.0,
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.4"
    },
    "star_tag": [
      "featured"
    ],
    "tags": [
      "Dataset",
      "Timeseries",
      "Drift"
    ],
    "task": "Filtering"
  },
  "nbformat": 4,
  "nbformat_minor": 4
}